Conversation

Contributor

@i-am-sijia i-am-sijia commented Nov 20, 2025

This PR addresses issue #892.

Status: Draft - core functionality implemented, remaining: logging and documentation.

Summary

Adds a new skip_failed_choices setting that allows ActivitySim to skip households that cause choice model failures, rather than masking the failure and forcing a choice via overflow_protection or crashing the entire simulation. This improves robustness for large-scale production runs where occasional edge cases shouldn't halt the model.

Key Changes

Core Logic (activitysim/core/)

  • Modified logit.py to temporarily mark failed choices with -99 and track skipped households in state
  • Added functionality in state.py to dynamically drop data belonging to skipped households
  • Updated simulate.py and interaction_sample_simulate.py to handle failed choices
  • Enhanced report_bad_choices() to log skipped households

Configuration

  • Added skip_failed_choices: bool = True setting in top.py

Model Updates (activitysim/abm/models/)

  • Retained household_id in chooser columns across location choice models for tracking
  • Modified shadow price calculations to exclude skipped households
  • Updated trip matrices to adjust household expansion weights: weight * (valid_hh / total_hh)
  • Added null trip mode handling in trip mode choice and school escorting

Testing

  • Added a new unit test, test/skip_failed_choices, that deliberately triggers failures and validates that the affected households are skipped

Usage

# settings.yaml
skip_failed_choices: true

Access skipped household info via the following state keys (example after the list):

  • state.get("num_skipped_households", 0)
  • state.get("skipped_household_ids", set())

The final count is logged at the end of the simulation in activitysim.log.

Notes

@jpn-- jpn-- added this to Phase 11 Dec 2, 2025
@jpn-- jpn-- moved this to In Progress in Phase 11 Dec 2, 2025
@i-am-sijia
Contributor Author

For components where we intentionally allow some choices to fail without being skipped, the allow_zero_probs option is already set to True, which exempts them from this new feature. Therefore, it may not be necessary to introduce this as a component-level setting. In addition, keeping it as a global setting helps reduce manual errors, avoid confusion, and prevent inconsistencies across the implementation.

@jpn-- jpn-- moved this from In Progress to Under Review in Phase 11 Dec 11, 2025
@jpn-- jpn-- self-requested a review December 11, 2025 01:03
@jpn-- jpn-- requested a review from Copilot January 8, 2026 17:44
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new skip_failed_choices feature that allows ActivitySim to skip households causing choice model failures rather than crashing the simulation or forcing a choice through overflow protection. This is designed to improve robustness in large-scale production runs where occasional edge cases shouldn't halt the entire model.

Key Changes:

  • Core logic modified to mark failed choices with -99, track skipped household IDs, and dynamically remove skipped household data from state tables
  • New skip_failed_choices setting added (defaults to True) with overflow protection disabled when active
  • Model updates to handle null modes, adjust trip matrices weights, exclude skipped households from shadow pricing, and retain household_id for tracking

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 15 comments.

Summary per file:

  • activitysim/core/logit.py: Adds logic to mark failed choices with -99, track skipped households in state, and conditionally skip vs. raise errors
  • activitysim/core/workflow/state.py: Implements update_table() to drop rows belonging to skipped households from all state tables
  • activitysim/core/simulate.py: Updates eval_nl to pass skip_failed_choices flag to report_bad_choices, adds trace_choosers parameter
  • activitysim/core/interaction_sample_simulate.py: Clips positions to 0 when skip_failed_choices enabled to prevent index errors from -99 values
  • activitysim/core/util.py: Always includes household_id in columns for tracking skipped households
  • activitysim/core/configuration/top.py: Adds skip_failed_choices boolean setting (defaults to True)
  • activitysim/abm/models/location_choice.py: Filters skipped households from shadow price calculations, adds household_id retention
  • activitysim/abm/models/trip_matrices.py: Adjusts household expansion weights for skipped households, logs final skip count
  • activitysim/abm/models/trip_mode_choice.py: Handles null trip modes from failed choices, filters skipped households from merged trips
  • activitysim/abm/models/school_escorting.py: Adds household_id to chooser columns
  • activitysim/abm/models/util/tour_*.py: Adds household_id to chooser columns in tour scheduling, OD, and destination functions
  • activitysim/abm/models/util/school_escort_tours_trips.py: Allows null trip modes when skip_failed_choices enabled
  • test/skip_failed_choices/test_skip_failed_choices.py: New test that deliberately triggers failures and validates skipping behavior
  • .github/workflows/core_tests.yml: Adds new skip_failed_choices test to CI workflow


)

# trip_mode can be na if the run allows skipping failed choices and the trip mode choice has failed
if state.settings.skip_failed_choices:

Copilot AI Jan 8, 2026

The early return on line 1048 bypasses the assertion that verifies all trip modes are non-null. While this is intentional when skipping failed choices, it means trips with null modes will be returned without any logging or tracking. Consider logging a warning about the null trip modes before returning to maintain visibility into the data quality.

Suggested change

if state.settings.skip_failed_choices:
    null_modes = trips.trip_mode.isna()
    if null_modes.any():
        null_count = int(null_modes.sum())
        # log a limited sample of trip_ids with null modes to avoid excessive log size
        sample_ids = trips[null_modes].index.to_list()[:10]
        logger.warning(
            "Returning trips with %d missing trip_mode values due to skip_failed_choices; "
            "example trip_ids with null modes: %s",
            null_count,
            sample_ids,
        )

Comment on lines +101 to +102
else:
    continue

Copilot AI Jan 8, 2026

The else: continue statement is redundant because the loop would naturally continue to the next iteration anyway. This can be removed to simplify the code.

Suggested change
else:
continue

Comment on lines +362 to +370
if state.settings.skip_failed_choices is not None:
    skip_failed_choices = state.settings.skip_failed_choices

if bad_probs.any() and not allow_bad_probs:
    report_bad_choices(
        state,
        bad_probs,
        probs,
        skip_failed_choices,

Copilot AI Jan 8, 2026

When state.settings.skip_failed_choices is None (i.e. not set), the local skip_failed_choices variable is never assigned and its use on line 370 raises a NameError. The variable should be initialized with a default value before the conditional check.
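
A minimal sketch of that fix, mirroring the snippet above and assuming a conservative default of False when the setting is unset:

skip_failed_choices = False  # bind the name before the conditional
if state.settings.skip_failed_choices is not None:
    skip_failed_choices = state.settings.skip_failed_choices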

Comment on lines +1490 to +1498
if state.settings.skip_failed_choices is not None:
    skip_failed_choices = state.settings.skip_failed_choices

if no_choices.any():
    logit.report_bad_choices(
        state,
        no_choices,
        base_probabilities,
        skip_failed_choices,

Copilot AI Jan 8, 2026

When state.settings.skip_failed_choices is None (i.e. not set), the local skip_failed_choices variable is never assigned and its use on line 1498 raises a NameError. The variable should be initialized with a default value before the conditional check.

Comment on lines +56 to +95
@pytest.fixture
def state():
    configs_dir = new_configs_dir
    output_dir = dir_test_path("output")
    data_dir = example_path("data")

    # turn the global setting on to skip failed choices
    update_settings(new_settings_file, "skip_failed_choices", True)

    # make some choices fail by setting extreme coefficients in the uec
    # auto ownership
    auto_ownership_uec_file = os.path.join(new_configs_dir, "auto_ownership.csv")
    # forcing households in home zone 8 (recoded 7) to fail auto ownership choice
    update_uec_csv(auto_ownership_uec_file, "@df.home_zone_id==7", -999.0)

    # work location choice
    work_location_choice_uec_file = os.path.join(
        new_configs_dir, "workplace_location.csv"
    )
    # forcing workers from home zone 18 to fail work location choice
    # as if there is a network connection problem for zone 18
    update_uec_csv(work_location_choice_uec_file, "@df.home_zone_id==18", -999.0)

    # trip mode choice
    trip_mode_choice_uec_file = os.path.join(new_configs_dir, "trip_mode_choice.csv")
    # forcing trips on drivealone tours to fail trip mode choice
    update_uec_csv(trip_mode_choice_uec_file, "@df.tour_mode=='DRIVEALONEFREE'", -999.0)

    state = workflow.State.make_default(
        configs_dir=configs_dir,
        output_dir=output_dir,
        data_dir=data_dir,
    )

    from activitysim.abm.tables.skims import network_los_preload

    state.get(network_los_preload)

    state.logging.config_logger()
    return state

Copilot AI Jan 8, 2026

The test fixture modifies shared configuration files in new_configs_dir, which is reused across test runs. This can cause test pollution where one test run affects subsequent test runs. The fixture should create test-specific configuration directories or properly reset configurations after each test.
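
One possible isolation approach, sketched below with pytest's built-in tmp_path fixture; new_configs_dir comes from the test module above and the fixture name is hypothetical:

import shutil

import pytest


@pytest.fixture
def isolated_configs_dir(tmp_path):
    # copy the shared configs into a per-test temporary directory before mutating them
    configs_dir = tmp_path / "configs"
    shutil.copytree(new_configs_dir, configs_dir)
    return configs_dir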

Comment on lines +942 to +945
# TODO: do not update tables if no new households were skipped
# right now it is updating every time which is inefficient
if self.get("num_skipped_households", 0) > 0:
    self.update_table()

Copilot AI Jan 8, 2026

The update_table method is called on every add_table call when there are skipped households, which could lead to severe performance issues. The method iterates through all existing tables and modifies them, even if they have already been updated. This should track which tables have been updated to avoid redundant processing, or use a more efficient approach like batching updates.
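
A standalone sketch of that tracking idea, using a plain dict of DataFrames rather than the actual State API; all names are illustrative:

import pandas as pd


def drop_skipped_households(
    tables: dict[str, pd.DataFrame], skipped_ids: set, already_filtered: set
) -> None:
    # filter each table at most once; later calls are no-ops for tables already processed
    for name, df in tables.items():
        if name in already_filtered or "household_id" not in df.columns:
            continue
        tables[name] = df[~df["household_id"].isin(skipped_ids)]
        already_filtered.add(name)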

Comment on lines +686 to +687
# always keep household_id
unique_variables_in_spec.add("household_id")

Copilot AI Jan 8, 2026

Always adding household_id to the columns could cause issues if household_id is not present in the original chooser table. This should check if household_id exists in the choosers columns before adding it to the set.
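
A sketch of the guarded version, following the variable names in the snippet above (whether a choosers frame is in scope at that point is an assumption):

if "household_id" in choosers.columns:
    # only track household_id when the chooser table actually carries it
    unique_variables_in_spec.add("household_id")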

state.run(models=state.settings.models, resume_after=None)

# check that the number of skipped households is recorded in state
assert state.get("num_skipped_households", 0) == 943

Copilot AI Jan 8, 2026

The hardcoded assertion value of 943 skipped households is fragile and will break if the test data changes or if the model behavior changes slightly. Consider using a range check or verifying that the number is greater than zero instead of an exact match.

Suggested change
assert state.get("num_skipped_households", 0) == 943
assert state.get("num_skipped_households", 0) > 0

final_persons_df = state.get_dataframe("persons")
assert not any(
    (final_persons_df["home_zone_id"] == 18)
    & (final_persons_df["is_worker"] == True)

Copilot AI Jan 8, 2026

The explicit comparison with == True is redundant and not in line with Python best practices; use the boolean Series directly.

Suggested change
& (final_persons_df["is_worker"] == True)
& (final_persons_df["is_worker"])

if self.get("num_skipped_households", 0) > 0:
    self.update_table()

def update_table(self, name: str = None):

Copilot AI Jan 8, 2026

The name parameter in the update_table method signature is unused. This parameter should either be used to selectively update a single table (which would be more efficient) or removed from the signature.

Member

@jpn-- jpn-- left a comment

Mostly looks fine. We discussed (possibly while @i-am-sijia was out) that the skipping should not be unlimited; the idea is we want to skip problems when problems are few and probably not seriously consequential. If the problems are many, there's little benefit in allowing a simulation to run to completion.


# trip_mode can be na if the run allows skipping failed choices and the trip mode choice has failed
if state.settings.skip_failed_choices:
    return trips
Member

Rather than just being OK here, the code should (see the sketch after this list):

  1. emit a warning that N choices have failed, and
  2. check if N exceeds configured failure thresholds (e.g. not more than 1% of all choices), in which case an error should be raised.
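
A self-contained sketch of that warn-then-raise pattern; the function name and the 1% default are illustrative, not part of this PR:

import logging

logger = logging.getLogger(__name__)


def check_failed_choices(n_failed: int, n_total: int, max_fraction: float = 0.01) -> None:
    # warn about failed choices and raise once they exceed the allowed fraction
    if n_failed == 0:
        return
    fraction = n_failed / max(n_total, 1)
    logger.warning("%d of %d choices failed and will be skipped", n_failed, n_total)
    if fraction > max_fraction:
        raise RuntimeError(
            f"{fraction:.2%} of choices failed, exceeding the allowed {max_fraction:.2%}"
        )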

how="left",
)
if len(choices_df) > 0:
choices_df = choices_df[
Member

Should check here how many choices are getting dropped, and

  1. emit a warning that N choices have failed, and
  2. check if N exceeds configured failure thresholds (e.g. not more than 1% of all households), in which case an error should be raised.


# print out number of households skipped due to failed choices
if state.settings.skip_failed_choices:
    logger.info(
Member

logging level for this should be higher than "info"; at least warning if not "error" level.

f"Adjusting household expansion weights in {hh_weight_col} to account for {state.get('num_skipped_households', 0)} skipped households."
)
# adjust the hh expansion weights to account for skipped households
adjustment_factor = state.get_dataframe("households").shape[0] / (
Member

Should the adjustment to expansion weights be based not on the number of skipped households, but instead on the original expansion weight total and the total expansion weight of the households we dropped? E.g. if we drop a household with a giant expansion weight that's a bigger thing than dropping a household with a smaller one.
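
A standalone sketch of that weight-based adjustment; it assumes the pre-drop household weights are still available (e.g. saved before update_table removes skipped rows), and all names are illustrative:

import pandas as pd


def weight_adjustment_factor(
    original_households: pd.DataFrame, skipped_ids: set, hh_weight_col: str
) -> float:
    # redistribute the expansion weight of dropped households across the remaining ones
    total_weight = original_households[hh_weight_col].sum()
    dropped_weight = original_households.loc[
        original_households.index.isin(skipped_ids), hh_weight_col
    ].sum()
    return total_weight / (total_weight - dropped_weight)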

locals_dict = {}
locals_dict.update(constants)
if state.settings.skip_failed_choices:
    mask_skipped = trips_merged["household_id"].isin(
Member

mask_skipped should be the same for trips_df and trips_merged, right? We only need to calculate the mask once.
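
A sketch of computing the mask once; it assumes trips_df and trips_merged share the same trip index, which should be verified:

skipped_ids = state.get("skipped_household_ids", set())
mask_skipped = trips_merged["household_id"].isin(skipped_ids)
trips_merged = trips_merged[~mask_skipped]
trips_df = trips_df.loc[trips_merged.index]  # reuse the same selection via the shared index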

.. versionadded:: 1.6
"""

Member

Need additional setting[s] to set thresholds for how many skips are OK and when it's too many and should be an error.
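
A hypothetical illustration of such a setting in the Settings model; the name, type, and default below are placeholders, not something this PR defines:

max_skipped_household_fraction: float = 0.01
"""Maximum fraction of households allowed to be skipped before the run raises an error."""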

state.set("num_skipped_households", num_skipped_households)
state.set("skipped_household_ids", skipped_household_ids)

logger.debug(
Member

logger level here should be warn not debug
